Data released by the Small Arms Survey concluded that, in 2018, the United States led the world in firearms per 100 residents [1]. Reports show that civilian purchases have pushed the total number of guns owned in America to 393 million. As guns become cheaper, smaller, and more widely produced, their availability for crime and murder increases [2]. The purpose of this study is to analyze data on gun violence. Measures have been taken to predict and prevent gun violence, and data can help determine whether those measures are effective. Data can also reveal new trends and inform new systems for limiting gun violence. The goal of this investigation is to explore trends in American gun violence and consider what can potentially be done about them.
In the process of exploration, it is important to generate hypotheses to guide our investigation. Brainstorming about potential correlations, one might expect the following to influence gun violence: the season of the year, population density, the strength of firearm legislation, peer pressure, the type of gun used, and the prevalence of gun stores.
Firstly, it helps to get a sense of the general trend of the data before diving into more specific relationships. Therefore, to get started, we will plot how the frequency of gun violence incidents has changed over time. The following code loads the packages we need and then extracts the useful columns from the csv file.
library(tidyverse)
library(lubridate)
library(ggplot2)
library(rvest)
library(stringr)
library(leaflet)
csv_file <- "./gun_violence.csv"
gvtb_one <- read_csv(csv_file) %>% select(incident_id, date, state, city_or_county, latitude, longitude, participant_age, gun_type, participant_status)
Now it's time to visualize the data. This code groups gun violence incidents by month and then plots the monthly counts.
gvtb_one %>% group_by(month=floor_date(date, "month")) %>%
summarize(incidents=n()) %>% ggplot(mapping=aes(x=month, y=incidents)) + geom_point() + geom_smooth(method=lm) + labs(title= "Incidents vs. Months", y="Incidents", x="Months")
Although the dataset includes the year 2013, brief observation makes it obvious that this year has missing entries (its counts are immensely smaller than those of other years). As a result, it makes sense to remove this year to get better results.
gvtb_one <- gvtb_one %>% filter(date >= as.POSIXct("2014-01-01"))
gvtb_one %>% group_by(month=floor_date(date, "month")) %>%
summarize(incidents=n()) %>% ggplot(mapping=aes(x=month, y=incidents)) + geom_point() + geom_smooth(method=lm) + labs(title= "Incidents vs. Months", y="Incidents", x="Months")
A linear regression line gives us a general idea of the plot’s positive trend: as time has passed since 2014, the number of gun violence incidents per month has increased as well. What is interesting about this graph is that the years seem to follow a pattern, peaking in the summer months and dipping in the winter months.
gvtb_one_year <- gvtb_one %>% mutate(year = floor_date(date, "year"))
gvtb_one_year %>% filter(year == as.POSIXct("2014-01-01")) %>% group_by(month=floor_date(date, "month")) %>%
summarize(incidents=n()) %>% ggplot(mapping=aes(x=month, y=incidents)) + geom_point() + geom_smooth(method=lm, formula = y ~ x + I(x^2)) + labs(title= "Incidents vs. Months (2014)", y="Incidents", x="Months")
gvtb_one_year %>% filter(year == as.POSIXct("2015-01-01")) %>% group_by(month=floor_date(date, "month")) %>%
summarize(incidents=n()) %>% ggplot(mapping=aes(x=month, y=incidents)) + geom_point() + geom_smooth(method=lm, formula = y ~ x + I(x^2)) + labs(title= "Incidents vs. Months (2015)", y="Incidents", x="Months")
gvtb_one_year %>% filter(year == as.POSIXct("2016-01-01")) %>% group_by(month=floor_date(date, "month")) %>%
summarize(incidents=n()) %>% ggplot(mapping=aes(x=month, y=incidents)) + geom_point() + geom_smooth(method=lm, formula = y ~ x + I(x^2)) + labs(title= "Incidents vs. Months (2016)", y="Incidents", x="Months")
In 2014, 2015, and 2016 there appears to be a parabolic curve, suggesting that as the seasons change, the number of gun incidents does as well. However, the pattern weakens over time; by 2016 the curve is only slight.
winter_2014 <- gvtb_one %>% group_by(month=floor_date(date, "month")) %>% filter(as.POSIXct("2013-12-01") <= month & as.POSIXct("2014-04-01") > month)
summer_2014 <- gvtb_one %>% group_by(month=floor_date(date, "month")) %>% filter(as.POSIXct("2014-06-01") <= month & as.POSIXct("2014-10-01") > month)
The contrast was clearly most pronounced in 2014. By filtering on a date range, we can separate out the two seasons and compare them.
winter_map <- leaflet(winter_2014) %>% addTiles() %>% addMarkers(clusterOptions = markerClusterOptions()) %>% setView(lat=38, lng=-95, zoom=4)
winter_map
summer_map <- leaflet(summer_2014) %>%addTiles() %>% addMarkers(clusterOptions = markerClusterOptions()) %>% setView(lat=38, lng=-95, zoom=4)
summer_map
By comparing the numbers and colors of the clusters, we can observe that gun violence is less frequent in the winter. For further reading, this source goes into significant detail about the trend and its potential causes.
winter_2015 <- gvtb_one %>% group_by(month=floor_date(date, "month")) %>% filter(as.POSIXct("2014-12-01") <= month & as.POSIXct("2015-04-01") > month)
summer_2015 <- gvtb_one %>% group_by(month=floor_date(date, "month")) %>% filter(as.POSIXct("2015-06-01") <= month & as.POSIXct("2015-10-01") > month)
winter_map <- leaflet(winter_2015) %>% addTiles() %>% addMarkers(clusterOptions = markerClusterOptions()) %>% setView(lat=38, lng=-95, zoom=4)
winter_map
summer_map <- leaflet(summer_2015) %>%addTiles() %>% addMarkers(clusterOptions = markerClusterOptions()) %>% setView(lat=38, lng=-95, zoom=4)
summer_map
Doing the same for 2015, we can see the contrast again.
winter_2016 <- gvtb_one %>% group_by(month=floor_date(date, "month")) %>% filter(as.POSIXct("2015-12-01") <= month & as.POSIXct("2016-04-01") > month)
summer_2016 <- gvtb_one %>% group_by(month=floor_date(date, "month")) %>% filter(as.POSIXct("2016-06-01") <= month & as.POSIXct("2016-10-01") > month)
winter_map <- leaflet(winter_2016) %>% addTiles() %>% addMarkers(clusterOptions = markerClusterOptions()) %>% setView(lat=38, lng=-95, zoom=4)
winter_map
summer_map <- leaflet(summer_2016) %>% addTiles() %>% addMarkers(clusterOptions = markerClusterOptions()) %>% setView(lat=38, lng=-95, zoom=4)
summer_map
And finally, with 2016 there is still a noticeable difference, yet it is clearly less pronounced.
Second on our list of hypotheses is population density. Based on work in the Journal of the American Statistical Association, it is reasonable to believe that there is some relationship between how crowded a place is and how much crime occurs. Therefore, we should investigate whether this also applies to the specific crime of gun violence.
url <- "https://www.governing.com/gov-data/population-density-land-area-cities-map.html"
popdtb <- url %>% read_html() %>% html_node("table") %>% html_table() %>% magrittr::set_colnames(list("location","population_density","population","land_area")) %>% dplyr::as_tibble()
We start by acquiring some more data. This table of population density by U.S. city was scraped from the internet.
popdtb <- popdtb %>% mutate(population_density = strtoi(str_replace_all(population_density, ",", ""))) %>% mutate(population= strtoi(str_replace_all(population, ",", "")))
The previous pipeline prepares the new data: we remove the commas from the population and population density columns so their values can be converted to integers.
Now we must prepare for the merge. The code below first mutates the location columns so that they align with the matching values in the population density table. Once the merge occurs, population density and incident frequency are matched up by location. Since our population density table doesn’t include smaller towns and cities, some locations are left with NA, so we need to filter them out before plotting the graph.
gvtb_two <- gvtb_one %>% mutate(city_or_county = ifelse(str_detect(city_or_county, "\\(county\\)")==TRUE, substring(city_or_county, 1, nchar(city_or_county) - 9), city_or_county))
gvtb_two <- gvtb_two %>% mutate(location = paste(city_or_county, ", ", state, sep=""))
tb <- gvtb_two %>% group_by(location) %>% summarize(incidents = n())
tb_a <- merge(x = tb, y = popdtb, by = "location", all.x = TRUE)
tb_a %>% filter(!is.na(population_density)) %>% ggplot(mapping=aes(x=population_density,
y=incidents)) + geom_point() + geom_smooth(method=lm) + labs(title= "Incidents vs. Population Density", y="Incidents", x="Population Density (Persons/Square Mile)")
It appears there are significant outliers that make it more difficult to analyze the data. We can filter out values that are greater than three standard deviations above the mean.
popdtb <- popdtb %>% filter(population_density < mean(population_density) + 3 * sd(population_density))
tb_b <- merge(x = tb, y = popdtb, by = "location", all.x = TRUE)
tb_b %>% filter(!is.na(population_density)) %>% filter(incidents < 9000) %>% ggplot(mapping=aes(x=population_density,
y=incidents)) + geom_point() + geom_smooth(method=lm) + labs(title= "Incidents vs. Population Density", y="Incidents", x="Population Density (Persons/Square Mile)")
There is also one significant outlier with an incident tally over 9000, so we can filter that out as well. The graph is surprising, as there doesn’t seem to be much of a trend. Let’s look at the values before making our decision.
popd_fit <- lm(incidents~population_density, data=tb_b)
popd_fit_stats <- popd_fit %>% broom::tidy()
popd_fit_stats
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 112. 36.1 3.10 0.00198
## 2 population_density 0.0230 0.00755 3.05 0.00240
With a p-value < 0.05, we can be statistically confident that gun violence increases with population density: for every additional person per square mile, the expected number of incidents in a location rises by about 0.02.
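To make that effect size concrete, the fitted line can be applied to a few sample densities. This is a rough sketch that hardcodes the coefficients printed in the table above (intercept ≈ 112, slope ≈ 0.023) rather than re-fitting the model:

```r
# Hedged sketch: plug sample densities into the fitted line reported above.
intercept <- 112
slope <- 0.023
densities <- c(1000, 5000, 10000)          # persons per square mile
predicted <- intercept + slope * densities # expected incidents per location
round(predicted)                           # roughly 135, 227, 342
```

So moving from a low-density town (1,000 persons/sq. mile) to a dense city (10,000) roughly doubles and a half the expected incident count under this simple model.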
The next hypothesis on our list is firearm legislation. The National Center for Health Research has invested a great deal of time trying to figure out whether it has an impact. Let’s see if we can find something in our data.
url <- "https://www.worldatlas.com/articles/us-states-by-population.html"
sptb <- url %>% read_html() %>% html_node("table") %>% html_table() %>% magrittr::set_colnames(list("rank","state","population")) %>% select(state, population) %>% dplyr::as_tibble()
sptb <- sptb %>% mutate(population = as.numeric(str_replace_all(population, ",", "")))
This first table is scraped above to get population values for each state. This is crucial because the magnitude of crime incidents can be attributed partly to a state’s relative population size; having these values allows us to take that out of the equation. One assumption being made is that the populations remain static from 2014-2018. Since every state is held to the same assumption, this should not be a problem if states grow in population at roughly the same rate.
url <- "https://lawcenter.giffords.org/scorecard/#MT"
stb <- url %>% read_html() %>% html_node("#rankings-table") %>% html_table() %>% magrittr::set_colnames(list("law_strength","state","grade","rate", "rateper100")) %>% select(law_strength, state) %>% dplyr::as_tibble()
The second table scraped above gives us an idea of state legislation. The Giffords Law Center has created rankings for each state based on how strong their firearm laws are.
In the following code we merge the two scraped tables with our modified data from the csv file. The District of Columbia doesn’t appear in our scraped tables, so we choose to ignore it.
tb_two <- gvtb_two %>% group_by(state) %>% summarize(incidents = n())
tb_two <- merge(x = tb_two, y = stb, by = "state", all.x = TRUE)
tb_two <- merge(x = tb_two, y = sptb, by = "state", all.x = TRUE)
tb_two %>% filter(state != "District of Columbia") %>% ggplot(mapping=aes(x=law_strength,
y=(incidents/population))) + geom_point() + geom_smooth(method=lm) + labs(title= "Incidents vs. Law Strength", y="Incidents/Population of State", x="Law Strength")
By taking the original incident reports and joining on each state’s population and law strength, we see that there does seem to be some relation between legislation and gun violence. Remember that stronger laws are represented by lower rank values.
tb_two_fit <- lm((incidents/population)~law_strength, data=tb_two)
tb_two_fit_stats <- tb_two_fit %>% broom::tidy()
tb_two_fit_stats
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.000715 0.0000994 7.20 0.00000000364
## 2 law_strength 0.00000401 0.00000342 1.17 0.246
Although visually there appears to be a connection, our p-value > 0.05 means that we cannot be confident enough to fit a regression model to the data.
altered_stb <- stb %>% mutate(gun_laws=ifelse(law_strength <= 10, "strong", "Temp"))
altered_stb <- altered_stb %>% mutate(gun_laws=ifelse(law_strength >= 40, "weak", gun_laws))
altered_stb <- altered_stb %>% filter(gun_laws == "weak" | gun_laws == "strong")
In another attempt to visualize a more significant contrast, we can separate out the states with the strongest and weakest firearm legislation. The code above keeps roughly the ten top-ranked and ten bottom-ranked states.
The code below is then used to merge the tables with the specified states to find some kind of interaction.
tb_three <- gvtb_two %>% group_by(state, month=floor_date(date, "month")) %>% summarize(incidents = n())
tb_three <- merge(x=tb_three, y=altered_stb, by="state", all.x=TRUE)
tb_three <- tb_three %>% filter(gun_laws == "weak" | gun_laws == "strong")
tb_three <- merge(x=tb_three, y=sptb, by="state", all.x=TRUE)
tb_three %>% ggplot(mapping=aes(y=incidents/population, x=month, color=gun_laws)) + geom_point() + geom_smooth(method=lm) + labs(title= "Incidents vs. Month", y="Incidents/Population of State", x="Month")
Now we can re-merge the data tables as before, but with the two groups of states separated. Although the slopes appear the same, the intercepts look different. This could mean that a divergence in legislation before 2014 was enough to separate the states’ trajectories.
Next up is peer pressure. Peer pressure is certainly a cause of petty crimes among youth, and group mentality can sometimes escalate crimes at any age. The following code manipulates the data to extract a count of the people involved in each incident.
tb_four <- gvtb_two %>% filter(!is.na(participant_age)) %>% mutate(number_involved = lengths(strsplit(participant_age, "\\|\\|")))
tb_four <- tb_four %>% group_by(number_involved) %>% summarize(incidents = n())
tb_four %>% ggplot(mapping=aes(y=incidents, x=number_involved)) + geom_bar(stat="identity") + labs(title= "Incidents vs. Number Involved", y="Incidents", x="Number Involved")
Plotting a simple graph does not favor this hypothesis: individual gun incidents occurred far more often than group events. It is important to point out that gun-related suicides were not included in the data set. Still, this does not disprove that there is a group effect on crime.
The second-to-last item in our list of hypotheses is gun type. Exploring the types of guns used can be important to forming legislation around them. This Washington Post article goes into depth explaining the restrictions set on different types of guns and attachments and how they differ. Here we will explore the differences between guns used in gun violence in order to make claims about what legislation might be most effective.
tb_five_a <- gvtb_one %>% filter(!is.na(gun_type))
tb_five_a <- tb_five_a %>% mutate(gun_type = str_match(gun_type, "\\d+::(\\w+\\.?\\w*\\s?\\[?\\w*\\-?\\w*\\]?).*")[,2])
tb_five_a <- tb_five_a %>% filter(gun_type != "Unknown" & gun_type != "Other")
tb_five <-tb_five_a %>% group_by(gun_type, month=floor_date(date, "month")) %>% summarize(incidents = n())
tb_five %>% ggplot(mapping=aes(y=incidents, x=month, color=gun_type)) + geom_point() + geom_smooth(method=lm) + labs(title= "Incidents vs. Months", y="Incidents", x="Months")
## Warning in qt((1 - level)/2, df): NaNs produced
In the code above we start with the original gun violence csv data. From there we use a regular expression to pull out the specific gun type from each incident. Graphing this data shows that handguns have led gun violence incidents in seemingly every month. However, the handgun-related incidents appear to trace a parabola over time and may now be declining.
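The regular expression above is dense, so it may help to see what it captures on a few synthetic `gun_type` strings. The example values below are assumptions shaped like the dataset's `index::type||index::type` encoding, not actual rows from the csv:

```r
library(stringr)

# Synthetic examples of the dataset's "index::type||index::type" encoding.
samples <- c("0::Handgun", "0::9mm||1::Shotgun", "0::7.62 [AK-47]")
pattern <- "\\d+::(\\w+\\.?\\w*\\s?\\[?\\w*\\-?\\w*\\]?).*"

# The capture group keeps only the first recorded gun type per incident:
# "Handgun", "9mm", "7.62 [AK-47]".
str_match(samples, pattern)[, 2]
```

Note that only the first listed gun type survives; incidents involving multiple gun types are reduced to their first entry.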
tb_five_fit_interaction <- lm(incidents~month*gun_type, data=tb_five)
tb_five_fit_interaction_stats <- tb_five_fit_interaction %>% broom::tidy()
tb_five_fit_interaction_stats
## # A tibble: 50 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -5.10 340. -0.0150 0.988
## 2 month 0.000395 0.0200 0.0197 0.984
## 3 gun_type12 gauge -97.3 395. -0.247 0.805
## 4 gun_type16 gauge 2.93 501. 0.00584 0.995
## 5 gun_type20 gauge -0.184 410. -0.000450 1.000
## 6 gun_type22 LR -289. 389. -0.743 0.458
## 7 gun_type223 Rem -311. 397. -0.783 0.434
## 8 gun_type25 Auto -5.69 398. -0.0143 0.989
## 9 gun_type28 gauge 6.10 3401. 0.00179 0.999
## 10 gun_type30-06 6.29 479. 0.0131 0.990
## # … with 40 more rows
Observing the regression metrics for an interaction model we see that the 9mm weapon has also been related to the increase in gun violence incidents over time.
tb_five_fit <- lm(incidents~month, data=tb_five)
anova(tb_five_fit)
## Analysis of Variance Table
##
## Response: incidents
## Df Sum Sq Mean Sq F value Pr(>F)
## month 1 78610 78610 11.306 0.0008023 ***
## Residuals 982 6827592 6953
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(tb_five_fit_interaction)
## Analysis of Variance Table
##
## Response: incidents
## Df Sum Sq Mean Sq F value Pr(>F)
## month 1 78610 78610 63.931 3.793e-15 ***
## gun_type 24 5148685 214529 174.468 < 2.2e-16 ***
## month:gun_type 24 530445 22102 17.975 < 2.2e-16 ***
## Residuals 934 1148462 1230
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From this analysis we see that the interaction model has a smaller residual sum of squares, leading us to believe that the type of gun has played a role in the trend of gun violence incidents over the past four years. Another area we can look into is how deadly these guns have been. The data provides the status of the victims; by relating the gun used to the people killed in the incident, we can make claims about which gun is deadliest in a given confrontation. As with the previous analysis, this can be important when forming legislation for specific weapons.
tb_six <- tb_five_a %>% filter(!is.na(participant_status))
tb_six <- tb_six %>% mutate(death_percentage = str_count(participant_status, pattern="\\d+::Killed")/lengths(strsplit(participant_status, "\\|\\|")))
tb_six <- tb_six %>% group_by(gun_type) %>% summarize(death_rate = mean(death_percentage)) %>%
filter(gun_type == "25 Auto" | gun_type == "9mm" | gun_type == "Handgun" | gun_type == "7.62 [AK-47]" | gun_type == "Shotgun" | gun_type == "30-06")
tb_six %>% ggplot(mapping=aes(x=gun_type, y=death_rate)) + geom_bar(stat = "identity")+ labs(title= "Death Rate vs. Gun Type", y="Death Rate", x="Gun Type")
In the code above, we first filter out incidents where the participant status was not confirmed in the report. Then we count the number of people in each incident who were killed. Grouping by gun type, we can identify which guns were deadliest on average in the incidents where they appeared. The full list is long, so we also keep only guns with notably high or low death rates. From this data we see that, on average, the 30-06 is the deadliest when used. On the other hand, the 25 Auto kills the fewest people. Guns that grabbed attention in earlier analysis, like the shotgun and handgun, still rank fairly high in deadliness.
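The per-incident death fraction above combines `str_count` and `strsplit`, which can be checked on a synthetic `participant_status` value. The string below is an assumption shaped like the dataset's `index::Status||…` encoding:

```r
library(stringr)

# Synthetic "index::status" string in the dataset's "||"-delimited encoding.
status <- "0::Killed||1::Injured||2::Killed"

killed   <- str_count(status, "\\d+::Killed")      # entries marked Killed: 2
involved <- lengths(strsplit(status, "\\|\\|"))    # total participants: 3
killed / involved                                  # death fraction: 2/3
```

Counting the plain pattern `\\d+::Killed` (without a trailing delimiter) matters here, since a `Killed` entry at the end of the string has no `||` after it.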
Other than legislation, gun stores can also influence the accessibility of firearms. With more stores in a state, it makes sense for that state to experience more gun-related incidents. On the other hand, registered gun stores follow procedures set by legislation, so having more gun stores might reflect a community with more legally purchased guns. To analyze this we first need data on gun shops.
fileName <- './stores.txt'
stores <- readChar(fileName, file.info(fileName)$size)
stores <- str_split(stores, "\\n")
Above we read the txt file from the U.S. Bureau of Alcohol, Tobacco, Firearms, and Explosives. This gives us a record of every kind of firearm distributor and manufacturer along with its address.
x <- data.frame("count" = 1:50, "state_ab" = c("AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA", "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD", "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ", "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC", "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY"))
x <- x %>% mutate(count = 0)
Then, here we create a new data frame which will house the incoming data.
for (j in 3:80013) {
# Field 15 of each tab-separated record holds the state abbreviation.
value <- which(grepl(str_split(stores[[1]][j], "\t")[[1]][15], x$state_ab))
# Increment the running tally for that state.
x[value, "count"] <- x[value, "count"] + 1
}
For every entry in the txt file we use a string split to extract the state value, then increment the count for that state. There are 80013 entries, based on the 2017 records.
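The loop above works, but R can do the same tally in one vectorized pass. A minimal sketch on synthetic lines (the tab-separated layout with the state abbreviation in field 15 is an assumption carried over from the loop above):

```r
# Synthetic tab-separated records mimicking the assumed ATF layout,
# with the state abbreviation in field 15.
lines <- c(
  paste(c(rep("x", 14), "TX"), collapse = "\t"),
  paste(c(rep("x", 14), "TX"), collapse = "\t"),
  paste(c(rep("x", 14), "MD"), collapse = "\t")
)

# Pull field 15 from every line, then tally with table() instead of a loop.
abbrevs <- vapply(strsplit(lines, "\t"), `[`, character(1), 15)
counts <- table(abbrevs)
counts  # MD: 1, TX: 2
```

Besides being shorter, this avoids the repeated single-cell data-frame writes that make the loop slow over 80,000 records.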
x$state <- c("Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut","Delaware","Florida", "Georgia", "Hawaii","Idaho","Illinois","Indiana","Iowa","Kansas","Kentucky","Louisiana","Maine","Maryland","Massachusetts","Michigan","Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire","New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio","Oklahoma","Oregon","Pennsylvania","Rhode Island","South Carolina", "South Dakota", "Tennessee", "Texas", "Utah","Vermont","Virginia", "Washington", "West Virginia","Wisconsin","Wyoming")
The step above maps the state abbreviations to full names so they conform to the gvtb table for a successful join.
tb_seven <- gvtb_one %>% group_by(state) %>% summarize(incidents=n())
tb_seven <- merge(x = tb_seven, y = x, by = "state", all.x = TRUE)
Above we group the gvtb incidents by state once again. Then we merge the two tables so that we can see the relationship between gun stores and gun violence incidents.
tb_seven %>% filter(state != "District of Columbia") %>% ggplot(mapping=aes(x=count, y=incidents)) + geom_point() + geom_smooth(method=lm) + labs(title= "Incidents vs. Store Frequency", y="Incidents", x="Store Frequency")
This data clearly shows some relationship between the number of firearm stores and the number of incidents. Let’s take a closer look.
tb_seven_fit <- lm(incidents~count, data=tb_seven)
tb_seven_fit_stats <- tb_seven_fit %>% broom::tidy()
tb_seven_fit_stats
## # A tibble: 2 x 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 1272. 718. 1.77 0.0829
## 2 count 2.16 0.347 6.23 0.000000112
With a p-value < 0.05, we can be confident that, in our data, each additional firearm distributor is associated with roughly two more incidents. However, gun legislation should be prominent in the stores that sell guns. Because of this, we should look into how this relationship is affected by the strength of a state’s legislation: if the number of stores increases, do incidents increase at the same rate even when legislation strength differs?
The following code plots total incidents against store frequency while separating states by the strength of their legislation.
tb_eight <- tb_three %>% group_by(state) %>% summarize(incidents_total = sum(incidents))
tb_eight <- merge(x = tb_eight, y = x, by = "state", all.x = TRUE)
tb_eight <- merge(x = tb_eight, y = tb_three, by = "state", all.x = TRUE)
tb_eight %>% filter(state != "District of Columbia") %>% ggplot(mapping=aes(x=count, y=incidents_total, color=gun_laws)) + geom_point() + geom_smooth(method=lm) + labs(title= "Incidents vs. Store Frequency", y="Incidents", x="Store Frequency")
Now we can see the interaction that legislation plays between stores and incidents. From brief visualization, it appears that as the number of stores increases, gun violence incidents rise faster in states with stronger legislation. This is not at all what we were expecting, so we should analyze the two models before making claims.
tb_eight_fit_interaction <- lm(incidents_total~count*gun_laws, data=tb_eight)
anova(tb_seven_fit)
## Analysis of Variance Table
##
## Response: incidents
## Df Sum Sq Mean Sq F value Pr(>F)
## count 1 404099693 404099693 38.798 1.12e-07 ***
## Residuals 48 499937138 10415357
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(tb_eight_fit_interaction)
## Analysis of Variance Table
##
## Response: incidents_total
## Df Sum Sq Mean Sq F value Pr(>F)
## count 1 7446387885 7446387885 1928.15 < 2.2e-16 ***
## gun_laws 1 7825730907 7825730907 2026.37 < 2.2e-16 ***
## count:gun_laws 1 3798365671 3798365671 983.54 < 2.2e-16 ***
## Residuals 1016 3923733688 3861943
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With a significantly small Pr(>F), we can see that legislation is interacting with this model.
Our data showed that gun violence incidents have increased since 2014. From the analysis we conducted, we can attribute ideas to what may be related. Firstly, it was observed that seasonal changes may have an impact on gun violence: winter months show considerably less of it, which would indicate that measures should be concentrated in the summer months to handle the increase in gun incidents. Secondly, we looked into population density and found that, although the effect is small, population density does contribute to an increase in a city’s gun violence. Next, we explored data related to legislation. Although comparison showed that states with stronger gun laws had less gun violence relative to their population, the effect was not significant enough to change the slope, implying that gun laws are not effective enough to alter the trajectory of gun violence going forward. Following that, we looked into how group mentality might encourage gun violence; inversely, it was found that the most frequent cases involved individuals, so our data was not sufficient to make claims about group dynamics. After that, we investigated different types of guns and learned that there is an interaction between the type of gun used and the increasing incident frequency; we also found which models were least and most deadly in individual confrontations. Finally, we explored the frequency of gun distributors and manufacturers across states. These factors are heavily influenced by legislation, so it was interesting to find that stronger legislation did not reduce gun violence incidents as store frequency climbed.
The purpose of this investigation was to get a better grasp of the factors influencing gun violence in the United States. As gun policy becomes an ever more political debate, it is important to have data and analysis to back up claims. This project can hopefully shed more light on how our country should handle gun legislation and what it should include specifically.